13 research outputs found

    Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus

    We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech, such as harmonic stacks, formants, onsets, and terminations, but we also find more exotic structures in the spectrogram representation of sound, such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds. Comment: For Supporting Information, see the PLoS website: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.100259

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth, and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Efficient coding in human auditory perception.

    Natural sounds possess characteristic statistical regularities. Recent research suggests that mammalian auditory processing maximizes information about these regularities in its internal representation while minimizing encoding cost [Smith, E. C. and Lewicki, M. S. (2006). Nature (London) 439, 978-982]. Evidence for this "efficient coding hypothesis" comes largely from neurophysiology and theoretical modeling [Olshausen, B. A., and Field, D. (2004). Curr. Opin. Neurobiol. 14, 481-487; DeWeese, M., et al. (2003). J. Neurosci. 23, 7940-7949; Klein, D. J., et al. (2003). EURASIP J. Appl. Signal Process. 7, 659-667]. The present research provides behavioral evidence for efficient coding in human auditory perception using six-channel noise-vocoded speech, which drastically limits spectral information and degrades recognition accuracy. Two experiments compared recognition accuracy of vocoded speech created using theoretically motivated, efficient coding filterbanks derived from the statistical regularities of speech against recognition using standard cochleotopic (logarithmic) or linear filterbanks. Recognition of the speech created using efficient coding filterbanks was significantly more accurate than either of the other classes. These findings suggest potential applications to cochlear implant design.
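    The six-channel noise-vocoding manipulation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the logarithmic band edges, filter order, and the synthetic test tone are all assumptions; the study's efficient-coding filterbanks would replace the edge list below.

    ```python
    # Minimal sketch of six-channel noise vocoding: split the signal into
    # six frequency bands, extract each band's amplitude envelope, use it
    # to modulate band-limited noise, and sum the bands.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    fs = 16000
    t = np.arange(fs) / fs
    signal = np.sin(2 * np.pi * 440 * t)      # stand-in for a speech clip
    rng = np.random.default_rng(0)

    # Six logarithmically spaced band edges (a "cochleotopic" filterbank);
    # an efficient-coding filterbank would supply different edges here.
    edges = np.geomspace(100, 4000, num=7)

    vocoded = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))      # band amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        vocoded += envelope * noise           # envelope-modulated noise
    ```

    The vocoded signal preserves only the coarse spectral envelope in six channels, which is what makes recognition sensitive to how the band edges are chosen.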

    Model comparisons to receptive fields from auditory midbrain.

    Complete and overcomplete sparse coding models trained on spectrograms of speech predict Inferior Colliculus (IC) spectro-temporal receptive field (STRF) shapes with excitatory and suppressive subfields that are localized in frequency but separated in time. (a) Two examples of gerbil IC neural STRFs [31] exhibiting ON-type response patterns with excitation following suppression; data courtesy of N.A. Lesica. (b) Representative model dictionary elements from each of three dictionaries that match this pattern of excitation and suppression. The three dictionaries were all trained on spectrogram representations of speech using a hard sparseness (L0) penalty; the representations were complete (left column; Fig. S2), two-times overcomplete (middle column; Fig. S3), and four-times overcomplete (right column; Fig. 4 and Fig. S4). (c) Two example neuronal STRFs from cat IC [30] exhibiting OFF-type patterns with excitation preceding suppression; data courtesy of M.A. Escabí. (d) Other model neurons from the same set of three dictionaries as in panel b also exhibit this OFF-type pattern.

    A half-complete, L0-sparse dictionary trained on spectrograms of speech.

    This dictionary exhibits a variety of distinct shapes that capture several classes of acoustic features present in speech and other natural sounds. (a-f) Selected elements from the dictionary that are representative of different types of receptive fields: (a) a harmonic stack; (b) an onset element; (c) a harmonic stack with flanking suppression; (d) a more localized onset/termination element; (e) a formant; (f) a tight checkerboard pattern (see Fig. S1 for the full dictionary). Each rectangle represents the spectro-temporal receptive field (STRF) of a single element in the dictionary; time is plotted along the horizontal axis (from 0 to 216 ms) and log frequency is plotted along the vertical axis, with frequencies ranging from 100 Hz to 4000 Hz. (g) A graph of the usage of the dictionary elements showing that the different types of receptive field shapes separate based on usage into a series of rises and plateaus; red symbols indicate where each of the examples from panels a-f fall on the graph. The vertical axis represents the number of stimuli that required a given dictionary element in order to be represented accurately during inference.

    Schematic illustration of our sparse coding model.

    (a) Stimuli used to train the model consisted of examples of recorded speech. The blue curve represents the raw sound pressure waveform of a woman saying, "The north wind and the sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak." (b) The raw waveforms were first put through one of two preprocessing steps meant to model the earliest stages of auditory processing to produce either a spectrogram or a "cochleogram" (not shown; see Methods for details). In either case, the power spectrum across acoustic frequencies is displayed as a function of time, with warmer colors indicating high power content and cooler colors indicating low power. (c) The spectrograms were then divided into overlapping 216 ms segments. (d) Subsequently, principal components analysis (PCA) was used to project each segment onto the space of the first two hundred principal components (first ten shown), in order to reduce the dimensionality of the data to make it tractable for further analysis while retaining its basic structure [17]. (e) These projections were then input to a sparse coding network in order to learn a "dictionary" of basis elements analogous to neuronal receptive fields, which can then be used to form a representation of any given stimulus (i.e., to perform inference). We explored networks capable of learning either "hard" (L0) sparse dictionaries or "soft" (L1) sparse dictionaries (described in the text and Methods) that were undercomplete (fewer dictionary elements than PCA components), complete (equal number of dictionary elements), or overcomplete (greater number of dictionary elements).
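    The pipeline this caption describes (spectrogram segments, PCA reduction, then sparse inference against a dictionary) can be sketched as follows. This is an illustrative toy, not the paper's code: the random data, the reduced dimensions, and the use of greedy matching pursuit as the "hard" (L0) inference step are all assumptions made for brevity.

    ```python
    # Sketch: segments -> PCA projection -> greedy L0-style sparse inference.
    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for overlapping spectrogram segments (n_segments x n_features).
    segments = rng.standard_normal((500, 64))

    # PCA via SVD: project each segment onto the top principal components
    # (the paper uses the first two hundred; we use 20 for the toy data).
    n_components = 20
    centered = segments - segments.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coded = centered @ vt[:n_components].T    # (500, 20) reduced data

    # A dictionary of unit-norm basis elements, two-times overcomplete
    # relative to the PCA space (random here; learned in the paper).
    dictionary = rng.standard_normal((2 * n_components, n_components))
    dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

    def matching_pursuit(x, D, n_active):
        """Greedily represent x using at most n_active dictionary rows."""
        residual = x.copy()
        coeffs = np.zeros(len(D))
        for _ in range(n_active):
            k = np.argmax(np.abs(D @ residual))   # best-matching element
            a = D[k] @ residual
            coeffs[k] += a
            residual -= a * D[k]
        return coeffs

    coeffs = matching_pursuit(coded[0], dictionary, n_active=5)
    ```

    The hard sparseness constraint shows up as the fixed budget of active elements per stimulus; a "soft" (L1) variant would instead penalize the summed coefficient magnitudes.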

    A half-complete sparse coding dictionary trained on cochleogram representations of speech.

    This dictionary exhibits a limited range of shapes. The full set of 100 elements from a half-complete, L0-sparse dictionary trained on cochleograms of human speech resembles those found in a previous study [17]. Nearly all elements are extremely smooth, with most consisting of a single frequency subfield or an unmodulated harmonic stack. Each rectangle can be thought of as representing the spectro-temporal receptive field (STRF) of a single element in the dictionary (see Methods for details); time is plotted along the horizontal axis (from 0 to 250 ms), and log frequency is plotted along the vertical axis, with frequencies ranging from 73 Hz to 7630 Hz. Color indicates the amount of power present at each frequency at each moment in time, with warm colors representing high power and cool colors representing low power. Each element has been normalized to have unit Euclidean length. Elements are arranged in order of their usage during inference (i.e., when used to represent individual sounds drawn from the training set), with usage increasing from left to right along each row, and all elements of lower rows used more than those of higher rows.

    A four-times overcomplete, L0-sparse dictionary trained on speech spectrograms.

    This dictionary shows a greater diversity of shapes than the undercomplete dictionaries. (a-l) Representative elements a, c, e, g, j, and l resemble those of the half-complete dictionary (see Fig. 3). Other neurons display more complex shapes than those found in less overcomplete dictionaries: (b) a harmonic stack with flanking suppressive subregions; (d) a neuron sensitive to lower frequencies; (f) a short harmonic stack; (h) a localized but complex pattern of excitation with flanking suppression; (i) a localized checkerboard with larger excitatory and suppressive subregions than those in panel l; (k) a checkerboard pattern that extends for many cycles in time. Several of these patterns resemble neural spectro-temporal receptive fields (STRFs) reported in various stages of the auditory pathway that have not been predicted by previous theoretical models (see text and Figs. 6-8). (m) A graph of usage of the dictionary elements during inference. The different classes of dictionary elements still separate according to usage (see Fig. S4 for the full dictionary), although the notable rises and plateaus seen in Fig. 3g are less apparent in this larger dictionary.
